0. Before we start…

…install all the packages that we will need. (if RStudio want to restart the session, just tell it ‘No’)

packages_needed <- c("readr", "dplyr", "ggplot2", "ggridges", "ggthemes", "gapminder", "ggraph", "tidygraph", "eurostat", "maps")

install.packages(packages_needed)

1. Data visualisation principles

Minimize noise, maximize signal in your graphs (or put it in other ways: maximize the data-ink ratio):

source: Darkhorse Analytics

  • avoid chart junk
  • Choose the type of plot depending on the type of data
  • label chart elements properly and informatively
  • ideally both x and y axis starts at 0 (scales can be really deceiving otherwise)
  • use consistent units! (do not mix yearly and month GDP for example)
  • ABSOLUTELY NO 3D PIE CHARTS. (When someone does 3D pie charts God makes a kitten cry.)

Me, seeing 3D charts (I am trigerred equally no matter the sub genre):

Resources:

2. ggplot2 and its extensions

The name stands for grammar of graphics and it enables you to build your plot layer by layer and having the ability to control every detail of the output (if you so wish). It is used by many in academia, by the Financial Times and FiveThirtyEight writers, among many others. During this workshop we will go through various types of data visualisations and try to apply the above set principles to our output.

You create plots with the below syntax:

# loading and shaping data
library(readr)
library(dplyr)

# data sources
library(gapminder)
library(eurostat)
library(maps)

# general data visualisation
library(ggplot2)
library(ggridges)
library(ggthemes)

# network related packages
library(ggraph)
library(tidygraph)
# data
iris_df <- iris

tarantino_538 <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/tarantino/tarantino.csv")

gapminder_df <- gapminder

oecd_sum <- read_csv("https://raw.githubusercontent.com/aakosm/perg_ggplot2_workshop18/master/data/oecd_sum.csv")
#> Warning: Missing column names filled in: 'X1' [1]

stocks <- read_csv("https://raw.githubusercontent.com/aakosm/perg_ggplot2_workshop18/master/data/stock_data.csv")

2.0 building our first ggplot

Let’s create the foundation of our plot by specifying for ggplot the data we use and the variable we want to plot.

p_hist <- ggplot(data = gapminder_df,
                 mapping = aes(x = gdpPercap))

# what happens if we just use this?
p_hist

We need to specify what sort of shape we want our data to be displayed. We can do this by adding the geom_histogram() function with a +

p_hist + 
    geom_histogram()

Looks a little bit skewed. Let’s log transform our variable with the scale_x_log10() function.

p_hist + 
    geom_histogram() +
    scale_x_log10()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

As the message says, we can mess around with the binwidth argument, so let’s do that.

p_hist + 
    geom_histogram(binwidth = 0.05) +
    scale_x_log10()

2.1 scatter plot (geom_point())

We use scatter plot to illustrate some association between two continuous variable. Usually, the y axis is our dependent variable (the variable which is explained) and x is the independent variable, which we suspect that drives the association.

Now, we want to know what is the association between the gdp per capita and life expectancy

ggplot(gapminder_df,
             mapping = aes(x = gdpPercap,
                           y = lifeExp)) +
    geom_point()

Now that we have a basic figure, let’s make it better. We transform the x axis values with the scale_x_log10() and add text to our plot with the labs() function. Within geom_point() we can also specify geom specific options, such as the alpha level (transparency).

ggplot(data = gapminder_df,
             mapping = aes(x = gdpPercap,
                           y = lifeExp)) +
    geom_point(alpha = 0.25) + # inside the geom_ we can modify its attributes. Here we set the transparency levels of the points
    scale_x_log10() + # rescale our x axis
    labs(x = "GDP per capita", 
         y = "Life expectancy",
         title = "Connection between GDP and Life expectancy",
         subtitle = "Points are country-years",
         caption = "Source: Gapminder")

To add some analytical power to our plot we can use geom_smooth() and choose a method for it’s smoothing function. It can be lm, glm, gam, loess, and rlm. We will use the linear model (“lm”). Note: this is purely for illustrative purposes, as our data points are country-years, so “lm” is not a proper way to fit a regression line to this data.

ggplot(data = gapminder_df,
             mapping = aes(x = gdpPercap,
                           y = lifeExp)) +
    geom_point(alpha = 0.25) + 
    scale_x_log10() +
    geom_smooth(method = "lm", se = TRUE, color = "orange") + # adding the regressiom line
    labs(x = "GDP per capita", 
         y = "Life expectancy",
         title = "Connection between GDP and Life expectancy",
         subtitle = "Points are country-years",
         caption = "Source: Gapminder")

what if we want to see how each continent fares in this relationship? We need to include a new argument in the mapping function: color =. Now it is clear that European countries (country-years) are clustered in the high-GDP/high life longevity upper right corner.

ggplot(data = gapminder_df,
             mapping = aes(x = gdpPercap,
                           y = lifeExp,
                           color = continent)) + # color by category
    geom_point(alpha = 0.5) + 
    scale_x_log10() + # rescale our x axis
    labs(x = "GDP per capita", 
         y = "Life expectancy",
         title = "Connection between GDP and Life expectancy",
         subtitle = "Points are country-years",
         caption = "Source: Gapminder")

We add horizontal line or vertical line to our plot, if we have a particular cutoff that we want to show. We can add these with the geom_hline() and geom_vline() functions.

ggplot(data = gapminder_df,
             mapping = aes(x = gdpPercap,
                           y = lifeExp,
                           color = continent)) + # color by category
    geom_point(alpha = 0.5) + 
    scale_x_log10() +
    geom_vline(xintercept = 3500) + # adding vertical line 
    geom_hline(yintercept = 70, linetype = "dashed", color = "black", size = 1) + # adding horizontal line
    
    labs(x = "GDP per capita", 
         y = "Life expectancy",
         title = "Connection between GDP and Life expectancy",
         subtitle = "Points are country-years",
         caption = "Source: Gapminder")

2.2 histogram

Using histograms to check the distribution of your data as we have seen in the intro.

ggplot(gapminder_df,
       mapping = aes(x = lifeExp)) +
    geom_histogram() 

ggplot(gapminder_df,
       mapping = aes(x = lifeExp)) +
    geom_histogram(binwidth = 1, color = "black", fill = "orange") # we can set the colors and border of the bars and set the binwidth or bins 

We can overlay more than one histogram on each other. See how different iris species have different sepal length distribution.

ggplot(data = iris_df,
       mapping = aes(x = Sepal.Length,
                     fill = Species)) +
    geom_histogram(binwidth = 0.1, position = "identity", alpha = 0.65) # using the position option so we can see all three variables

2.3 density plots

A variation on histograms is called density plots that uses Kernel smoothing (fancy! but in reality is a smoothing function which uses the weighted averages of neighboring data points.)

ggplot(iris_df,
       mapping = aes(x = Sepal.Length)) +
    geom_density()

Add some fill

ggplot(iris_df,
       mapping = aes(x = Sepal.Length)) +
    geom_density(fill = "orange", alpha = 0.3)

Your intutition is correct, we can overlap this with our histogram

ggplot(iris_df,
       mapping = aes(x = Sepal.Length)) +
    geom_histogram(aes(y = ..density..),
                   binwidth = 0.1,
                   fill = "white",
                   color = "black") +# we add this so the y axis is density instead of count.
    geom_density(alpha = 0.25, fill = "orange")

And similarly to the historgram, we can overlay two or more density plot as well.

ggplot(iris_df,
       mapping = aes(x = Sepal.Length,
                     fill = Species)) +
    geom_density(alpha = 0.5)

2.3.1 ridgeline/joyplot

This one is quite spectacular looking and informative. It has a similar function as the overlayed histograms but presents a much clearer data. For this, we need the ggridges package which is a ggplot2 extension.

ggplot(data = iris_df,
       mapping = aes(x = Sepal.Length,
                     y = Species,
                     fill = Species)) +
    geom_density_ridges(scale = 0.8, alpha = 0.5)

2.4 bar charts

We can use the bar charts to visualise categorical data. Let’s prep some data.

tarantino_rip <- tarantino_538 %>% 
    filter(type == "death")

ggplot(data = tarantino_rip,
       aes(x = movie)) +
    geom_bar()

We can use the fill option to map another variable onto our plot. Let’s see how these categories are further divided by the type of event in the movies (profanity or death). By default we get a stacked bar chart.

ggplot(tarantino_538, aes(movie, fill = type)) +
    geom_bar()

we can use the position function in the geom_bar to change this. Another neat trick to make our graph more readable is coord_flip.

ggplot(tarantino_538, aes(movie, fill = type)) +
    geom_bar(position = "dodge") +
    coord_flip()

Let’s make sure that the bars are proportional. For this we can use the y = ..prop.. and group = 1 arguments, so the y axis will be calculated as proportions. The ..prop.. is a temporary variable that has the .. surrounding it so there is no collision with a variable named prop.

ggplot(tarantino_538, aes(movie, fill = type)) +
    geom_bar(position = "dodge",
             aes(y = ..prop.., group = type)) +
    coord_flip()

Maybe it is best to facet by type.

ggplot(tarantino_538, aes(movie, fill = type)) +
    geom_bar(position = "dodge",
             aes(y = ..prop.., group = type)) +
    coord_flip() +
    facet_wrap(~type, ncol = 2)

2.4.1 Lollipop charts

The lollipop chart is a better barchart in a sense that it conveys the same information with better data/ink ratio. It also looks better. (note: some still consider it a gimmick)

For this we will modify a chart from the Data Visualisation textbook

# for the data see the github repository of the workshop

p <- ggplot(data = oecd_sum,
       mapping = aes(x = year, y = diff, color = hi_lo)) 


p + geom_segment(aes(y = 0, x = year, yend = diff, xend = year)) +
    geom_point() +
    theme(legend.position="none") +
    labs(x = NULL, y = "Difference in Years",
       title = "The US Life Expectancy Gap",
       subtitle = "Difference between US and OECD
                   average life expectancies, 1960-2015",
       caption = "Adapted from Kieran Healy: Data Visualisation, fig.4.21 ")
#> Warning: Removed 1 rows containing missing values (geom_segment).
#> Warning: Removed 1 rows containing missing values (geom_point).

2.5 box plot

ggplot(data = iris_df,
       mapping = aes(x = Species,
                     y = Sepal.Length)) +
    geom_boxplot()

We add color coding to our boxplots as well.


ggplot(data = iris_df,
       mapping = aes(x = Species,
                     y = Sepal.Length,
                     fill = Species)) +
    geom_boxplot(alpha = 0.5)

2.6 violin chart

ggplot(data = iris_df,
       mapping = aes(x = Species,
                     y = Sepal.Length)) +
    geom_violin()

2.7 line chart

For this we use data on stock closing prices. As we are now familiar with the ggplot2 syntax, I do not write out all the data = and mapping =.

ggplot(stocks, aes(date, stock_closing, color = company)) +
    geom_line()

Add some refinements.

ggplot(stocks, aes(date, stock_closing, color = company)) +
    geom_line(size = 1) +
    labs(x = "", y = "Prices (USD)",
         title = "Closing daily prices for selected tech stocks",
         subtitle = "Data from 2016-01-10 to 2018-01-10",
         caption = "source: Yahoo Finance")

faceting helps.

ggplot(stocks, aes(date, stock_closing, color = company)) +
    geom_line(size = 1) +
    labs(x = "", y = "Prices (USD)",
         title = "Closing daily prices for selected tech stocks",
         subtitle = "Data from 2016-01-10 to 2018-01-10",
         caption = "source: Yahoo Finance") +
    facet_wrap(~company, nrow = 4)

3. Themes and plot elements

3.1 Themes

In this section we will go over some of the elements that you can modify in order to get an informative and nice looking figure. ggplot2 comes with a number of themes. You can play around the themes that come with ggplot2 and you can also take a look at the ggthemes package, where I included the economist theme. Another notable theme is the hrbthemes package.

Try out a couple to see what they differ in! The ggthemes package has a nice collection of themes to use. The theme presets can be used with the theme_*() function.

ggplot(data = gapminder_df,
             mapping = aes(x = gdpPercap,
                           y = lifeExp)) +
    geom_point(alpha = 0.25) + 
    scale_x_log10() + 
    theme_minimal() # adding our chosen theme

3.2 Plot elements

Of course we can set all elements to suit our need, without using someone else’s theme.

The key plot elements that we will look at are:

  • labels
  • gridlines
  • fonts
  • colors
  • legend
  • axis breaks

Adding labels, title, as we did before.

ggplot(data = gapminder_df,
             mapping = aes(x = gdpPercap,
                           y = lifeExp,
                           color = continent)) +
    geom_point(alpha = 0.5) + 
    scale_x_log10() + 
    labs(x = "GDP per capita", 
         y = "Life expectancy",
         title = "Connection between GDP and Life expectancy",
         subtitle = "Points are country-years",
         caption = "Source: Gapminder",
         color = "Continent") # changing the legend title

Let’s use a different color scale! We can use a color brewer scale (widely used for data visualization).

ggplot(data = gapminder_df,
             mapping = aes(x = gdpPercap,
                           y = lifeExp,
                           color = continent)) +
    geom_point(alpha = 0.5) + 
    scale_x_log10() + 
    scale_color_brewer(name = "Continent", palette = "Set1") # adding the color brewer color scale

Or we can define our own colors:


ggplot(data = gapminder_df,
             mapping = aes(x = gdpPercap,
                           y = lifeExp,
                           color = continent)) +
    geom_point(alpha = 0.5) + 
    scale_x_log10() + 
    scale_color_manual(values=c("red", "blue", "orange", "black", "green")) # adding our manual color scale

To clean up clutter, we will remove the background, and only leave some of the grid behind. We can hide the tickmarks with modifying the theme() function, and setting the axis.ticks to element_blank(). Hiding gridlines also requires some digging in the theme() function with the panel.grid.minor or .major functions. If you want to remove a gridline on a certain axis, you can specify panel.grid.major.x. We can also set the background to nothing. Furthermore, we can define the text attributes as well in our labels.

ggplot(data = gapminder_df,
             mapping = aes(x = gdpPercap,
                           y = lifeExp,
                           color = continent)) +
    geom_point(alpha = 0.5) + 
    scale_x_log10() + 
    theme(axis.ticks = element_blank(), # removing axis ticks
          panel.grid.minor = element_blank(), 
          panel.background = element_blank()) # removing the background

Finally, let’s move the legend around. Or just remove it with theme(legend.position="none"). We also do not need the background of the legend, so remove it with legend.key, and play around with the text elements of the plot with text.


ggplot(data = gapminder_df,
             mapping = aes(x = gdpPercap,
                           y = lifeExp,
                           color = continent)) +
    geom_point(alpha = 0.5) + 
    scale_x_log10() + 
    theme(axis.ticks = element_blank(), # removing axis ticks
          panel.grid.minor = element_blank(), # removing the gridline
          panel.background = element_blank(), # removing the background
          legend.title = element_text(size = 12), # setting the legends text size
          text = element_text(face = "plain", family = "sans"), # setting global text options for our plot
          legend.key=element_blank(),
          legend.position = "bottom")# removing the background

While we are at it, we want to have labels for our data. For this, we’ll create a plot which can exploit this.

What we use is the geom_text to have out labels in the chart.

gapminder <- gapminder %>% 
    filter(year == 2002, continent == "Europe")


ggplot(gapminder, aes(lifeExp, gdpPercap, label = country)) + # we add the labels!
    geom_point() +
    geom_text() # and use the geom text

notice the different outcome of geom_label instead of geom_text.

ggplot(gapminder, aes(lifeExp, gdpPercap, label = country)) + # we add the labels!
    geom_point() +
    geom_label() # and use the geom label

If we want to label a specific set of countries we can do it from inside ggplot, without needing to touch our data.

ggplot(gapminder, aes(lifeExp, gdpPercap, label = country)) + # we add the labels!
    geom_point() +
    geom_text(aes(label = if_else(lifeExp > 80, country, NULL)), nudge_x = 0.5) # we add a conditional within the geom. Note the nudge_x!
#> Warning: Removed 26 rows containing missing values (geom_text).

4. Special cases

4.1 Network visualization

Let’s load our data from an edgelist. We are using the tidygraph ggraph packages, but both are heavily dependent on the igraph package which is one of the most powerful one for network analysis in R.

# data
edges_got <- read_csv("https://raw.githubusercontent.com/melaniewalsh/sample-social-network-datasets/master/sample-datasets/game-of-thrones/got-edges.csv")

let’s create the network object and add some network statistics to our small social network

soc_nw <- as_tbl_graph(edges_got, directed = FALSE) %>% 
  activate(nodes) %>% 
  mutate(centrality = centrality_eigen(), community = as.factor(group_infomap()))

We plot the network with the ggraph() function, that is a network oriented extension of ggplot2. The nodes and links are plotted separately with the geom_edge_* and geom_node_*. In this case link and point.

ggraph(soc_nw, layout = "kk") +
  geom_edge_link(alpha = 0.35) +
  geom_node_point()

alternative with modifications to link and node attributes. Note the theme_graph() at the end.

ggraph(soc_nw, layout = "fr") +
  geom_edge_link(aes(width = Weight), alpha = 0.2) +
  scale_edge_width(range = c(0.5,2)) +
  geom_node_point(aes(size = centrality), alpha = 0.8) +
  theme_graph()
#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

final touch, let’s add the communities in the network and labels for our nodes.

ggraph(soc_nw, layout = "fr") +
  geom_edge_link(aes(width = Weight), alpha = 0.2) +
  scale_edge_width(range = c(0.5,1.5)) +
  geom_node_point(aes(size = centrality, color = community), alpha = 0.8) +
  geom_node_text(aes(label = name), size = 3, repel = TRUE) +
  scale_color_brewer(palette = "Set2") +
  labs(title = "Social network of the Song of Ice and Fire books",
       caption = "Data: <github.com/melaniewalsh/sample-social-network-datasets>") +
  theme_graph()
#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

#> Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family not found in Windows font database

4.2 Maps

Two essential parts of creating a map with ggplot2: - shapefile which draws the map - some data that we want to plot over the map

Getting the map data from the maps package

world <- map_data("world")

We can plot the empty map

ggplot(data = world,
       mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "black") +
  coord_cartesian()

We can also subset the map data, just as we can with any other R object

# subsetting the world data
world_subset <- world[world$region == "France",]

ggplot(data = world_subset,
       mapping = aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "white", color = "black") +
  coord_cartesian()

adding data to our map. If you are not familiar with the dplyr package, it is one of the better data wrangling solutions out there. We subset our gapminder data for the year 1977. Then add a new row that matches the region variable in the map data so we can merge the two dataset. (we also get rid of Antarctica, because of aesthetics)

year_2000 <- gapminder_df %>% 
  filter(year == 1977) %>% 
  mutate(country = as.character(country))

# adding key variable for merge
year_2000$region <- year_2000$country

# merging the data and the map and getting rid of antarctica
world_data <- left_join(world, year_2000, by = "region") %>% 
  filter(region != "Antarctica")

And now we can plot the map and data with the geom_polygon() and coord_quickmap(). I also made some modifications to the theme, so it looks better.

ggplot(world_data, aes(long, lat, group = group, fill = lifeExp)) +
  geom_polygon(color = "gray90", size = 0.05 ) +
  coord_quickmap() +
  labs(fill = "Life expectancy",
       title = "Life expectancy around the world",
       subtitle = "1977",
       caption = "Data: Gapminder") +
  scale_fill_viridis_c(na.value = "white", direction = -1) +
  theme_bw() +
  theme(axis.line=element_blank(),
        axis.text=element_blank(),
        axis.ticks=element_blank(),
        axis.title=element_blank(),
        panel.background=element_blank(),
        panel.border=element_blank(),
        panel.grid=element_blank(),
        panel.spacing=unit(0, "lines"),
        plot.background=element_blank(),
        legend.justification = c(0,0),
        legend.position = c(0,0)
  )

4.2.1 example from the Eurostat package

The vignette contains the full tutorial on how to use the eurostat package to get data through the eurostat API. If you are interested check it out later.

We get the data, which is disposable household income data for NUTS2 regions. We merge it with the shapefile, that we have in the geodata object.

dat <- get_eurostat("tgs00026", time_format = "raw", stringsAsFactors = FALSE) %>% 
  # subsetting to year 2014 and NUTS-2 level
  dplyr::filter(time == 2014, nchar(geo) == 4) %>% 
  # classifying the values the variable
  dplyr::mutate(cat = cut_to_classes(values))


# Download geospatial data from GISCO
geodata <- get_eurostat_geospatial(resolution = "60", nuts_level = "2")

# merge with attribute data with geodat
map_data <- inner_join(geodata, dat)

plotting with the coord_sf

ggplot(data=map_data) + 
  geom_sf(aes(fill=cat),color="dim grey", size=.1) + 
  scale_fill_brewer(palette = "Oranges") +
  guides(fill = guide_legend(reverse=T, title = "euro")) +
  labs(title="Disposable household income in 2014",
       caption="(C) EuroGeographics for the administrative boundaries 
       Map produced in R with a help from Eurostat-package <github.com/ropengov/eurostat/>") +
  theme_light() + 
  theme(legend.position=c(.8,.8)) +
  coord_sf(xlim=c(-12,44), ylim=c(35,70))

If we want to add arbitrary points to the map, we can do that by specifying the longitudinal and latitudinal coordinates.

long <- c(10)
lat <- c(50)
my_city <- as.data.frame(cbind(long,lat))

Then just plot over our map with the geom_point()

ggplot(data=map_data) + 
  geom_sf(fill= "white", color="dim grey", size=.1) + 
  geom_point(data = my_city, aes(x = long, y = lat), color = "orange", size = 5, alpha = 0.7) +
  theme_light() +
  coord_sf(xlim=c(-12,44), ylim=c(35,70))